Building up the case for time-dependent visualizations

The problem statement

These examples are re-used from section 2.6.5 of https://ggplot2-book.org/getting-started#sec-line.

The economics dataset from the ggplot2 package contains monthly economic data on the US from July 1967 to April 2015.

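Here is a brief look at the first 5 of the 574 rows of the economics data frame. The columns are date (month of observation), pce (personal consumption expenditures, in billions of dollars), pop (total population, in thousands), psavert (personal savings rate), uempmed (median duration of unemployment, in weeks) and unemploy (number of unemployed, in thousands).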

library(ggplot2)  # provides the economics dataset and ggplot()
data <- head(economics, n = 5)
knitr::kable(data)
date        pce    pop     psavert  uempmed  unemploy
1967-07-01  506.7  198712  12.6     4.5      2944
1967-08-01  509.8  198911  12.6     4.7      2945
1967-09-01  515.6  199113  11.9     4.6      2958
1967-10-01  512.2  199311  12.9     4.9      3143
1967-11-01  517.4  199498  12.8     4.7      3066
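The row count and time span quoted above can be checked directly. This short sketch only assumes that ggplot2 is installed, since economics ships with it:

```r
library(ggplot2)

# Number of rows (months) and columns (variables)
dim(economics)

# First and last month covered by the data
range(economics$date)
```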

Let’s first make a simple time series plot of the unemployment rate. This is a continuous variable, computed as the ratio unemploy / pop.

In ggplot2, the frame is defined by the aesthetic mapping, created with the function aes(): it maps variables in the data to the visual space where they will be drawn as glyphs. The obvious frame for this plot maps date to the x coordinate and the ratio unemploy / pop to the y coordinate. The glyphs drawn over this frame are line segments joining consecutive data points; they are created with the function geom_line().

ggplot(data = economics, mapping = aes(x = date, y = unemploy / pop)) +
  geom_line()
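The ratio does not have to be computed inside aes(). As a sketch, storing it first as an explicit column (unemp_rate is a hypothetical name chosen for this example) gives the same plot and makes the derived variable easier to inspect or reuse:

```r
library(ggplot2)

# Copy of economics with the ratio stored as an explicit column
# (unemp_rate is a hypothetical name used only in this example)
econ <- transform(economics, unemp_rate = unemploy / pop)

p <- ggplot(data = econ, mapping = aes(x = date, y = unemp_rate)) +
  geom_line()
p
```

Computing inside aes() keeps the data untouched; a precomputed column is convenient when the same derived variable appears in several plots.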

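Strictly speaking, unemploy / pop is not the official unemployment rate. The US Bureau of Labor Statistics defines the unemployment rate as the number of unemployed people as a percentage of the labor force (https://www.bls.gov/cps/cps_htgm.htm#definitions), whereas pop in economics is the total population. The ratio plotted here is therefore the fraction of the total population that is unemployed, which is smaller than the official rate and serves here as a proxy for it.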

Another variable from the same dataset, uempmed, records the median duration of unemployment, in weeks.

ggplot(economics, aes(date, uempmed)) +
  geom_line()

These two plots show a trend toward longer median unemployment durations around 2010. They also show cycles of roughly 5 to 10 years between peaks in the unemployment rate.
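One way to check the trend described above is to average uempmed within each calendar year. This sketch is an illustration, not part of the original example: it reuses the year() helper that appears further down in this section, base R's aggregate(), and a hypothetical column name yr.

```r
library(ggplot2)

# Calendar year of a Date (same helper as used later in this section)
year <- function(x) as.POSIXlt(x)$year + 1900

# Average the monthly medians within each calendar year
# (yr is a hypothetical column name used only here)
econ_years <- transform(economics, yr = year(date))
annual <- aggregate(uempmed ~ yr, data = econ_years, FUN = mean)

# Year with the highest average median duration of unemployment
annual[which.max(annual$uempmed), ]

ggplot(annual, aes(yr, uempmed)) +
  geom_line() +
  geom_point()
```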

An interesting question is how these two time series relate to each other over time. In ggplot2, the frame for this new representation maps unemploy / pop to the x coordinate and uempmed to the y coordinate. The glyphs are of two kinds: geom_point() draws one point per observation, and geom_path() connects the points in the order in which they appear in the dataset, which here is chronological, so it traces the sequential trajectory of the two variables. Unlike geom_line(), which sorts the observations by x before connecting them, geom_path() preserves the row order. The code below draws such a graph.

ggplot(economics, aes(unemploy / pop, uempmed)) + 
  geom_path() +
  geom_point()
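The difference between geom_path() and geom_line() is easiest to see on a tiny made-up data frame whose rows are not sorted by x. In this sketch, layer_data() returns the data each layer actually draws:

```r
library(ggplot2)

# Toy data (hypothetical), deliberately out of x order
toy <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))

# geom_path() keeps the row order of the data
path_data <- layer_data(ggplot(toy, aes(x, y)) + geom_path())

# geom_line() sorts the observations by x before connecting them
line_data <- layer_data(ggplot(toy, aes(x, y)) + geom_line())

path_data$x  # 3 1 2
line_data$x  # 1 2 3
```

With economics, geom_line() would join the points in order of unemploy / pop rather than in order of time, which is why geom_path() is the right choice for a trajectory.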

From the lines alone it is hard to tell which way time flows: nothing shows where the series starts, where it ends, or where any year in between falls.

This can be addressed by mapping the colour aesthetic of geom_point() to the year of each observation. The helper function year() extracts the calendar year from date, and because no colour scale is specified, ggplot2 applies its default continuous colour scale.

year <- function(x) as.POSIXlt(x)$year + 1900
ggplot(economics, aes(unemploy / pop, uempmed)) + 
  geom_path(colour = "grey50") +
  geom_point(aes(colour = year(date)))
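A variation on the plot above, not taken from the source: it names the colour scale explicitly with scale_colour_viridis_c() instead of relying on the default blue gradient, and adds an arrowhead to the path with arrow() (re-exported by ggplot2 from grid), so the end of the trajectory is visible even without colour.

```r
library(ggplot2)

# Calendar year of a Date
year <- function(x) as.POSIXlt(x)$year + 1900

p <- ggplot(economics, aes(unemploy / pop, uempmed)) +
  # A single arrowhead at the end of the path marks the most recent month
  geom_path(colour = "grey50", arrow = arrow(length = unit(3, "mm"))) +
  geom_point(aes(colour = year(date))) +
  # Explicit continuous colour scale instead of the default one
  scale_colour_viridis_c() +
  labs(colour = "Year")
p
```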

Passing colour = "grey50" to geom_path() outside aes() is a setting, not a mapping: every segment of the path is drawn in the same constant colour, regardless of the data. By contrast, aes(colour = year(date)) in geom_point() maps the year of each observation to a colour through a scale, so different points get different colours.
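The distinction between setting and mapping can be made concrete by putting colour = "grey50" outside and then inside aes(). This sketch inspects, with layer_data(), the colours each version actually draws:

```r
library(ggplot2)

# Setting: a constant outside aes(); every segment is drawn in "grey50"
p_set <- ggplot(economics, aes(unemploy / pop, uempmed)) +
  geom_path(colour = "grey50")

# Mapping: a constant inside aes() is treated as data, so ggplot2 builds
# a one-level discrete scale and a legend, and picks its own colour
p_map <- ggplot(economics, aes(unemploy / pop, uempmed)) +
  geom_path(aes(colour = "grey50"))

unique(layer_data(p_set)$colour)  # "grey50"
unique(layer_data(p_map)$colour)  # a default hue colour, not grey
```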

An animation makes the flow of time explicit for the two variables, unemployment rate and median unemployment duration in weeks. The code below builds it with the plotly package: the ggplot2 plot gets an additional frame aesthetic mapped to year(date), and ggplotly() turns each distinct year into one animation frame.

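library(plotly)  # also attaches ggplot2, which plotly depends on
year <- function(x) as.POSIXlt(x)$year + 1900
p <- ggplot(economics, aes(unemploy / pop, uempmed)) +
  geom_path(colour = "grey75") +
  # ggplot2 does not know the `frame` aesthetic and warns about it, but
  # ggplotly() reads it to build the animation frames, so the warning is expected
  geom_point(aes(colour = year(date), frame = year(date)))
Warning in geom_point(aes(colour = year(date), frame = year(date))): Ignoring
unknown aesthetics: frame
fig <- ggplotly(p)

# Each frame lasts 1000 ms; redraw = FALSE skips a full redraw after each transition
fig <- fig %>%
  animation_opts(1000,
                 easing = "elastic",
                 redraw = FALSE)

# Place the play button in the bottom-right corner of the plotting area
fig <- fig %>%
  animation_button(x = 1,
                   xanchor = "right",
                   y = 0,
                   yanchor = "bottom")

# Label the slider with the current year, shown in red
fig <- fig %>%
  animation_slider(
    currentvalue = list(prefix = "YEAR ",
                        font = list(color = "red")))
fig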